We are looking for the best player available for our team. We need a central position player with main offensive qualities. Defensive qualities are secondary, as we already have one of the best defences in the league.
Some details about the data, let’s review the specific columns when you open the dataset:
Athlete = Names of individual athletes in the dataset. These names are in the format “Surname, Initial”
Statistic = These are different netball stats captured for each player.
Team = This column refers to one of the eight teams the athlete is playing for.
Total = Total of the statistic. For example, the total number of attempts from Zone 1, or the total number of goals, or the total number of minutes played.
For this analysis, we focused on a general analysis of the following positions: Goal defender, Wing defender, Centre, Wing attack, Goal attack.
NB : In order to fill the lack of data concerning the specific positions of the players (moreover, I do not know this sport at all), we assume in this analysis that all the central positions are one unique position category.
It is necessary to express the playing time data in order to make a relevant comparison.
Activité Offencive : cumul of disposals season divided by minutes played.
Activité Defencive : cumul gains (defections + interceps) divided by minutes played.
Postes : Who he is ? Goal Shooter = 1, Goal Keeper = 2, Other Positions = 0, (filter positions).
# Loading packages :
library(flexdashboard)
library(tidyverse)
library(viridis)
library(ggplot2)
library(plotly)
library(readxl)
library(DT)
library(stats)
library(factoextra)
# Import Raw Data :
DataSet <- read.csv("Data/Dataset3_Assessment3.csv")
# Import Characterization Table .xls :
Variable_ID <- read_excel("Data/Variables_Details.xlsx")
# Cleaning of the DataSet :
Clean_Df <- DataSet %>% mutate(Statistic = str_replace(Statistic,"attempt_from_zone1","attempts1"),
Statistic = str_replace(Statistic,"attempt_from_zone2","attempts2"),
Statistic = str_replace(Statistic,"goal_from_zone1","goal1"),
Statistic = str_replace(Statistic,"goal_from_zone2","goal2")) %>%
filter(!Total == 0)
# Merge two Dataframe in One fusion :
Df_Details <- merge(Clean_Df, Variable_ID, by.x ="Statistic", by.y ="Variables",all = TRUE)
# Remove Na because some Statistic variable aren't present in DataSet :
Df_Details <- Df_Details %>% drop_na() %>%
# Who is a shooter or not ? Goal Shooter =1, Goal Keeper = 2, Other Positions = 0
mutate(Poste_played = ifelse(Statistic =="goalAttempts" & Total > 0,1,
ifelse(Statistic =="blocks" & Total > 0,2,0)))
# Activity Variables in a Dataframe filtered :
Activity_DF <- Df_Details %>% filter(Statistic == "minutesPlayed"| # Relative
Statistic == "disposals"| # Attaque +
Statistic == "badPasses"| # Attaque -
Statistic == "interceptPassThrown"| # Attaque -
Statistic == "deflections"| # Defense +
Statistic == "gain"| # Defense +
Statistic == "obstructionPenalties"|
Statistic == "intercepts") # Defense +
# Pivot Table to observe the dependence between variable and choose the good one for this analysis :
Activity_DF <- Activity_DF %>% pivot_wider(names_from = Statistic, values_from = Total) %>%
group_by(Athlete,Poste_played) %>%
replace(is.na(.), 0) %>%
summarise(minutesPlayed = max(minutesPlayed),
disposals = max(disposals),
Bad_Disposal = sum(badPasses+interceptPassThrown),
deflections = max(deflections),
obstructionPenalties = max(obstructionPenalties),
gain = max(gain),
intercepts = max(intercepts),
Gain_corrected = sum(deflections + intercepts),
# e.g, you can have 2 disposal but 4 Bad, that's mean the calculation of disposal don't count bad tentative, so Full_Disposal :
Full_Disposal = sum(Bad_Disposal+disposals),
Efficiency_Att = round((Full_Disposal - Bad_Disposal)/Full_Disposal,2),
Full_Deff_Actions = sum(obstructionPenalties+Gain_corrected),
Efficiency_Def = round((Full_Deff_Actions - obstructionPenalties)/Full_Deff_Actions,2)) %>%
mutate(Def_Act = round(Gain_corrected/minutesPlayed*4*15,1),
Off_Act = round(disposals/minutesPlayed*4*15,1))
# We fixe a thresold of 3match for this analysis to have a more representative sample :
Activity_DF <- Activity_DF %>% filter(minutesPlayed>180)
# Gain and Gain_corrected are different ?
library(ggpubr) # package to add regression label on the chart
ggplot(data = Activity_DF, mapping = aes(x = gain , y = Gain_corrected))+
geom_point() +
geom_smooth(method=lm) +
stat_regline_equation(label.x=15, label.y=150) + # allowed to show regression equation
stat_cor(method = "pearson", label.x = 15, label.y = 140)+ #R2 idem
labs(title = "Graph 0 : Correlation between 'Gain' variable and 'Corrected Value'")
# Profil - Comparaison demande poste en activities offensive et def :
library(scales)
# Fonction Kmeans clustering :
library(stats)
# because scale can't take df with non numeric variable, we have to create new df :
df <- data.frame(Activity_DF$Def_Act, Activity_DF$Off_Act) %>% scale()
#We need to scale df because ifnot we can't use kmeans fonction:
kn3 <- kmeans(df, centers = 3, nstart = 25) # I chose 3 clusters
df2 <- data.frame( Athlete_Name = Activity_DF$Athlete,
Clusters =kn3$cluster,
Activity_Def = Activity_DF$Def_Act,
Activity_Off = Activity_DF$Off_Act,
Efficiency_Def = Activity_DF$Efficiency_Def,
Efficiency_Att = Activity_DF$Efficiency_Att)
# Plots :
p1 <- ggplot(data = df2, mapping = aes(x=Activity_DF$Def_Act,y=Activity_DF$Off_Act)) +
geom_point() +
geom_label(mapping = aes(label = Activity_DF$Athlete))+
labs(y = "Offensive Activity (Actions per 60min = 1game)", x = "Defencive Activity (per 60min = 1game)") +
labs(title = "Graph 1 : Offensive and Defensive Activity ")
ggplotly(p1)
# Special plot with the package factoextra, to draw cluster on the graph:
fviz_cluster(kn3, data = df,palette = "Set2", ggtheme = theme_minimal()) +
labs(title = "Graph 2 : Athlete Clustering - Offensive and Defensive Activity ",
subtitle = "Clustering by Kmeans",
caption = "This Chart highligth 3 categories : Defencive domination, Offensive Domination and Mixt Profil.") # Title design
NB : the activity data is expressed in number of offensive or defensive actions per match unit. The metrix was expressed in relation to the athletes time played and expressed in count per match (4*15min) in order to facilitate the understanding of the values.
Graph 0 : in view of the differences between the calculated “Gain” variable and the “Corrected Gain” variable which follows the basic evaluation calculation (sum of passes and shots). It seems relevant to consider the “Corrected gain” variable for the further analysis.
Graph 1 : shows the groupings of athletes with dominantly defensive, dominantly offensive or mixed activities.
Graph 2 : classifies the different athletes into three distinct clusters using the Kmeans method. The data appears to be grouped as “mixed activity”, “attack dominated activity”, “defense dominated activity”
NB : the most interesting profile for our recruitment is therefore the profile named “attack dominated activity”!
Profitability is the balance between the amount of work the player does and his efficiency. Having a player with a lot of game action but little efficiency is not interesting for us. Similarly, if a player is very efficient but does not do much in the game, then he does not affect the dynamics of the game.
# Creat Top3 filter on the offencive activity variable:
df3 <- df2 %>% group_by(Clusters) %>% mutate(top3_thresold = ifelse(rank(-Activity_Off)<=3,"TRUE","FALSE"))
library(formattable) # package to design table
formattable(df3, align =c("l","c","c","c","c","c"),
Athlete_Name = formatter("span", style = ~ style(color = "black",font.weight = "bold")),
# Colors choose for colour-blind person
list(Efficiency_Def = color_bar("#00FF00"),
Efficiency_Att = color_bar("#03fc77")))
| Athlete_Name | Clusters | Activity_Def | Activity_Off | Efficiency_Def | Efficiency_Att | top3_thresold |
|---|---|---|---|---|---|---|
| Aiken, R | 2 | 0.6 | 56.8 | 0.40 | 0.98 | FALSE |
| Altmann, K | 2 | 1.5 | 28.9 | 0.89 | 0.90 | FALSE |
| Anstiss, J | 3 | 2.2 | 6.0 | 0.70 | 0.88 | FALSE |
| Aryang, S | 3 | 5.5 | 0.4 | 0.76 | 0.33 | FALSE |
| Austin, K | 2 | 3.1 | 49.0 | 0.75 | 0.95 | FALSE |
| Bassett, C | 2 | 0.2 | 48.7 | 0.11 | 0.99 | FALSE |
| Bragg, M | 3 | 4.7 | 2.2 | 0.83 | 0.59 | FALSE |
| Brandley, A | 3 | 4.0 | 2.0 | 0.52 | 0.80 | FALSE |
| Browne, K | 2 | 0.8 | 29.9 | 0.45 | 0.94 | FALSE |
| Browne, M | 2 | 2.6 | 28.0 | 0.78 | 0.86 | FALSE |
| Bruce, C | 1 | 10.8 | 6.0 | 0.72 | 0.82 | FALSE |
| Cassidy, M | 2 | 1.5 | 42.0 | 0.86 | 0.95 | FALSE |
| Charles, V | 2 | 1.8 | 35.2 | 0.72 | 0.94 | FALSE |
| Colyer, I | 2 | 1.3 | 28.6 | 0.69 | 0.94 | FALSE |
| Cosh, E | 2 | 1.1 | 31.9 | 0.30 | 0.94 | FALSE |
| Craig, S | 3 | 3.3 | 1.4 | 0.49 | 0.80 | FALSE |
| Dehaney, K | 1 | 7.8 | 6.6 | 0.46 | 1.00 | TRUE |
| Dunkley, L | 2 | 3.0 | 34.2 | 0.89 | 0.96 | FALSE |
| Dwan, T | 2 | 2.0 | 43.1 | 0.76 | 0.95 | FALSE |
| Eddy, K | 3 | 3.6 | 1.1 | 0.65 | 0.62 | FALSE |
| Fowler, J | 2 | 0.7 | 63.7 | 0.43 | 1.00 | TRUE |
| Francis, S | 3 | 3.9 | 1.1 | 0.75 | 0.39 | FALSE |
| Garbin, S | 2 | 2.6 | 62.0 | 0.93 | 0.95 | TRUE |
| Gardner, M | 2 | 1.9 | 32.4 | 0.67 | 0.93 | FALSE |
| Garrett, M | 3 | 3.9 | 1.1 | 0.45 | 0.44 | FALSE |
| Glasgow, S | 2 | 1.6 | 37.9 | 0.71 | 0.96 | FALSE |
| Guscoth, L | 3 | 4.6 | 0.8 | 0.52 | 0.64 | FALSE |
| Hadley, P | 2 | 2.7 | 23.6 | 0.66 | 0.92 | FALSE |
| Harten, J | 2 | 1.9 | 48.3 | 0.80 | 0.96 | FALSE |
| Hay, M | 2 | 2.7 | 36.0 | 0.88 | 0.94 | FALSE |
| Haythornthwaite, N | 2 | 1.2 | 35.9 | 0.71 | 0.94 | FALSE |
| Hinchliffe, T | 3 | 5.8 | 4.8 | 0.62 | 0.95 | FALSE |
| Horjus, G | 2 | 0.7 | 40.6 | 0.80 | 0.93 | FALSE |
| Housby, H | 2 | 3.2 | 42.4 | 0.88 | 0.94 | FALSE |
| Jenner, K | 1 | 7.0 | 0.7 | 0.71 | 0.69 | FALSE |
| Jovic, M | 3 | 3.5 | 21.6 | 0.76 | 0.89 | TRUE |
| Klau, S | 1 | 8.9 | 6.1 | 0.67 | 0.93 | FALSE |
| Koenen, C | 2 | 0.7 | 47.3 | 0.67 | 0.98 | FALSE |
| Kumwenda, M | 2 | 1.0 | 48.2 | 0.75 | 0.98 | FALSE |
| Langman, L | 2 | 2.8 | 32.6 | 0.78 | 0.94 | FALSE |
| Lewis, O | 3 | 4.9 | 6.0 | 0.60 | 0.85 | FALSE |
| Mannix, E | 1 | 8.5 | 6.7 | 0.80 | 0.94 | TRUE |
| Manu’a, K | 3 | 4.0 | 1.8 | 0.51 | 0.69 | FALSE |
| Maweni, P | 1 | 7.8 | 5.7 | 0.65 | 0.99 | FALSE |
| McAuliffe, M | 3 | 4.2 | 5.4 | 0.83 | 0.85 | FALSE |
| Mentor, G | 1 | 9.6 | 5.8 | 0.76 | 0.96 | FALSE |
| Mi Mi, J | 2 | 1.6 | 29.9 | 0.85 | 0.91 | FALSE |
| Moloney, K | 2 | 2.4 | 25.1 | 0.76 | 0.96 | FALSE |
| Moore, L | 3 | 5.5 | 1.7 | 0.60 | 0.88 | FALSE |
| Nankivell, M | 3 | 3.5 | 24.1 | 0.62 | 0.95 | TRUE |
| Nelson, S | 2 | 0.6 | 44.7 | 0.67 | 0.99 | FALSE |
| Parmenter, A | 1 | 7.6 | 3.1 | 0.83 | 0.69 | FALSE |
| Petty, H | 2 | 3.5 | 27.0 | 0.67 | 0.89 | FALSE |
| Philip, T | 2 | 0.5 | 42.4 | 0.50 | 0.95 | FALSE |
| Pitman, C | 2 | 1.3 | 34.4 | 0.68 | 0.90 | FALSE |
| Poolman, S | 3 | 5.7 | 7.1 | 0.67 | 0.99 | FALSE |
| Potgieter, L | 2 | 0.9 | 41.1 | 0.52 | 0.98 | FALSE |
| Pretorius, K | 1 | 7.3 | 1.0 | 0.90 | 0.62 | FALSE |
| Price, J | 3 | 5.8 | 30.0 | 0.91 | 0.94 | TRUE |
| Proscovia, P | 2 | 1.6 | 39.1 | 1.00 | 0.98 | FALSE |
| Proud, M | 2 | 3.1 | 45.8 | 0.76 | 0.94 | FALSE |
| Russell, J | 3 | 4.4 | 5.5 | 0.95 | 0.81 | FALSE |
| Scherian, L | 2 | 0.6 | 25.5 | 0.62 | 0.89 | FALSE |
| Shimmin, K | 3 | 4.6 | 1.9 | 0.46 | 0.50 | FALSE |
| Simpson, G | 3 | 3.9 | 2.0 | 0.84 | 0.62 | FALSE |
| Sinclair, G | 2 | 1.1 | 43.4 | 0.74 | 0.94 | FALSE |
| Sterling, S | 1 | 10.4 | 6.2 | 0.70 | 0.92 | TRUE |
| Teague-Neeld, A | 2 | 2.1 | 44.3 | 0.56 | 0.95 | FALSE |
| Thwaites, C | 2 | 0.7 | 39.5 | 0.42 | 0.98 | FALSE |
| Turner, M | 1 | 8.5 | 0.4 | 0.61 | 0.50 | FALSE |
| Wallace, S | 2 | 1.4 | 58.6 | 0.82 | 0.97 | TRUE |
| Ward, J | 1 | 7.3 | 0.4 | 0.65 | 0.24 | FALSE |
| Watson, L | 2 | 2.0 | 46.6 | 0.73 | 0.96 | FALSE |
| Weston, J | 3 | 4.7 | 0.9 | 0.56 | 0.46 | FALSE |
| Wood, S | 2 | 0.9 | 48.9 | 0.73 | 0.97 | FALSE |
| van der Merwe, S | 3 | 3.9 | 2.0 | 0.83 | 0.57 | FALSE |
library(ggrepel) # package to manage label in the graph
P2 <- ggplot(data = df3, mapping = aes(x = Efficiency_Att, y = Activity_Off)) +
geom_point(aes(color = Clusters)) +
facet_wrap(. ~ Clusters,scales = "free")+
geom_label_repel(aes(label = ifelse(top3_thresold, as.character(Athlete_Name),"")))+
labs(title = "Graph 3 : Offensive Analysis - Efficiency vs Acticity ",
subtitle = "Find the best player per profil, which player is working most efficiently ?",
caption = "Performance is a balance between work and efficiency !")
P2
This graph shows the balance between the player’s activity during the season, but also his efficiency. This allows us to determine which player has done the most work efficiently. In this case, we are concerned with evaluating the quality of our players on the offensive activity.
TOP 3: Price, Nankivell and Jovic, are the best performing players in the Kmean category “Mixed activity”,
TOP 3: Garbin, Fowler and Wallace, are the best performing players in the Kmean category “Attacking dominance activity”,
TOP 3: Mannix, Delaney and Sterling, are the best performing players in the Kmean category “Defending dominance activity”.
Finally, for our team, the most relevant choices seem : Garbin, Fowler and Wallace!
In order to remain one of the best defenses in the league, it would be interesting that our player is a minimum of faculty in the defensive system…
# Creat filtred dataframe with the top players
df4 <- df3 %>% filter(Athlete_Name =="Garbin, S"|
Athlete_Name =="Fowler, J"|
Athlete_Name =="Wallace, S")
P3 <- ggplot(data = df4, mapping = aes(x = Efficiency_Def, y = Activity_Def)) +
geom_col(aes(fill = Athlete_Name)) +
scale_fill_viridis_d()+
labs(title = "Graph 4 : TOP 3 players in Attack ",
subtitle = "which one are the best defender ?",
caption = " WE HAVE TO CALL Garbin,S!")
P3
This graph shows the profitability of the top 3 players isolated as mentioned above. This allows us to determine which player has done the most work efficiently balance. In this case, we focused on the defensive activity.
This player fits our needs best. In addition to his offensive activity of 62 actions per game with an efficiency of 95%. He also knows how to be efficient in a defensive system with 3 actions per game, 93% efficiency 💪🏽.